A simple and fast method to determine the parameters for fuzzy c-means cluster validation
Abstract
Motivation: Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional data sets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method to determine its two parameters, the fuzzifier and the cluster number. Wrong parameter values may either lead to the inclusion of purely random fluctuations in the results or ignore potentially important data. The optimal solution has parameter values for which the clustering does not yield any results for a purely random data set but which detects cluster formation with maximum resolution on the edge of randomness.
Results: Estimation of the optimal parameter values is achieved by evaluating the results of the clustering procedure applied to randomized data sets. In this case, the optimal value of the fuzzifier follows common rules that depend only on the main properties of the data set. Taking the dimension of the set and the number of objects as input values, instead of evaluating the entire data set, allows us to propose a functional relationship that determines its value directly. This result speaks strongly against setting the fuzzifier equal to 2, as is typically done in many previous studies. Validation indices are generally used to estimate the optimal number of clusters. A comparison shows that the minimum distance between the centroids provides results that are equivalent to or better than those obtained by other, computationally more expensive indices.
Contact: [email protected]
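The two quantities discussed in the abstract, the fuzzifier and the minimum distance between centroids, can be illustrated with a minimal sketch of plain fuzzy c-means in standard-library Python. This is not the authors' implementation: the function names (`fuzzy_c_means`, `min_centroid_distance`), the random initialization from data points, the stopping rule, and the use of the squared Euclidean distance in the index are all assumptions made for illustration. The functional relationship for the fuzzifier is not given in the abstract and is therefore not reproduced here; `m` is simply passed in.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data, centroids, m):
    """Membership matrix: u_ik is proportional to d_ik^(-2/(m-1))."""
    u = []
    for p in data:
        # Clamp to avoid division by zero when a point sits on a centroid.
        d2 = [max(sq_dist(p, v), 1e-12) for v in centroids]
        w = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(w)
        u.append([x / total for x in w])
    return u


def fuzzy_c_means(data, c, m, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means (illustrative sketch); returns (centroids, u)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumed initialization: c distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(data, c)]
    for _ in range(n_iter):
        u = memberships(data, centroids, m)
        new = []
        for k in range(c):
            # Centroid k is the mean of the data weighted by u_ik^m.
            um = [row[k] ** m for row in u]
            total = sum(um)
            new.append([sum(w * p[j] for w, p in zip(um, data)) / total
                        for j in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids, memberships(data, centroids, m)


def min_centroid_distance(centroids):
    """Validation index: smallest squared distance between two centroids."""
    return min(sq_dist(a, b)
               for i, a in enumerate(centroids)
               for b in centroids[i + 1:])


if __name__ == "__main__":
    # Toy data: two tight groups, far apart.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    centroids, u = fuzzy_c_means(data, c=2, m=2.0)
    print(sorted(centroids), min_centroid_distance(centroids))
```

In this sketch, a cluster number would be assessed by running `fuzzy_c_means` for several values of `c` and comparing `min_centroid_distance`; a value close to zero indicates that at least two centroids have merged, that is, the requested clusters are not all resolved at the chosen fuzzifier.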
Related articles
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
ANFIS modeling and validation of a variable speed wind turbine based on actual data
This paper presents ANFIS modeling and validation of a Vestas 660 kW wind turbine, based on actual data obtained from the Eoun-Ebn-Ali wind farm in Tabriz, Iran, and on FAST. The turbine is modeled by deriving the non-linear dynamic equations of its different subsystems. Then, the model parameters are identified to match the actual response. ANFIS is an artificial intelligent tech...
A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is suitable only for crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...
ADAPTIVE NEURO FUZZY INFERENCE SYSTEM BASED ON FUZZY C–MEANS CLUSTERING ALGORITHM, A TECHNIQUE FOR ESTIMATION OF TBM PENETRATION RATE
Estimating the penetration rate of a tunnel boring machine (TBM) is one of the crucial and complex tasks frequently encountered in mechanized tunnel excavation. Estimating the machine penetration rate may reduce the risks related to the high capital costs typical of excavation operations. Thus establishing a relationship between rock properties and TBM pe...
Oil Reservoirs Classification Using Fuzzy Clustering (RESEARCH NOTE)
Enhanced Oil Recovery (EOR) is a well-known method to increase oil production from oil reservoirs. Applying EOR to a new reservoir is a costly and time consuming process. Incorporating available knowledge of oil reservoirs in the EOR process eliminates these costs and saves operational time and work. This work presents a universal method to apply EOR to reservoirs based on the available data by...
Publication date: 2010